Levantine Arabic


DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models

Bafna, Niyati, Chang, Emily, Robinson, Nathaniel R., Mortensen, David R., Murray, Kenton, Yarowsky, David, Sirin, Hale

arXiv.org Artificial Intelligence

Most of the world's languages and dialects are low-resource and lack support in mainstream machine translation (MT) models. However, many of them have a closely related high-resource language (HRL) neighbor, from which they differ in linguistically regular ways. This underscores the importance of model robustness to dialectal variation and of cross-lingual generalization across the HRL's dialect continuum. We present DialUp, consisting of a training-time technique for adapting a pretrained model to dialectal data (M->D) and an inference-time intervention adapting dialectal data to the model's expertise (D->M). M->D induces model robustness to potentially unseen and unknown dialects through exposure to synthetic data exemplifying linguistic mechanisms of dialectal variation, whereas D->M treats dialectal divergence for known target dialects. These methods show considerable performance gains for several dialects from four language families, and modest gains for two other language families. We also conduct feature and error analyses, which show that language varieties with low baseline MT performance are more likely to benefit from these approaches.
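The M->D augmentation idea above can be sketched as a rule-based noiser that perturbs HRL sentences into dialect-like variants for training. The lexicon and character rules below are invented toy examples, not DialUp's actual linguistically derived mechanisms:

```python
import random

# Toy, invented rule sets for illustration only; DialUp derives its
# perturbations from attested mechanisms of dialectal variation.
LEXICAL_SWAPS = {"house": "haus", "water": "wasser"}  # HRL word -> dialect word
CHAR_RULES = [("th", "d"), ("ing", "in")]             # sound-change-style rules

def dialectal_noise(sentence, p=0.5, seed=None):
    """Apply toy lexical and character-level rules with probability p each,
    simulating a synthetic dialect of the input sentence."""
    rng = random.Random(seed)
    words = []
    for w in sentence.split():
        if w in LEXICAL_SWAPS and rng.random() < p:
            w = LEXICAL_SWAPS[w]          # whole-word lexical substitution
        else:
            for src, tgt in CHAR_RULES:   # character-level substitutions
                if src in w and rng.random() < p:
                    w = w.replace(src, tgt)
        words.append(w)
    return " ".join(words)

print(dialectal_noise("the house is near the water", p=1.0, seed=0))
```

Applying such a noiser to HRL training bitext yields synthetic source sentences that expose the model to dialect-like variation without requiring any real dialect data.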


Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection

Ahmed, Ahmed Haj, Yew, Rui-Jie, Minocher, Xerxes, Venkatasubramanian, Suresh

arXiv.org Artificial Intelligence

Social media platforms have become central to global communication, yet they also facilitate the spread of hate speech. For underrepresented dialects like Levantine Arabic, detecting hate speech presents unique cultural, ethical, and linguistic challenges. This paper explores the complex sociopolitical and linguistic landscape of Levantine Arabic and critically examines the limitations of current datasets used in hate speech detection. We highlight the scarcity of publicly available, diverse datasets and analyze the consequences of dialectal bias within existing resources. By emphasizing the need for culturally and contextually informed natural language processing (NLP) tools, we advocate for a more nuanced and inclusive approach to hate speech detection in the Arab world.


DIALECTBENCH: An NLP Benchmark for Dialects, Varieties, and Closely-Related Languages

Faisal, Fahim, Ahia, Orevaoghene, Srivastava, Aarohi, Ahuja, Kabir, Chiang, David, Tsvetkov, Yulia, Anastasopoulos, Antonios

arXiv.org Artificial Intelligence

Language technologies should be judged on their usefulness in real-world use cases. An often overlooked aspect in natural language processing (NLP) research and evaluation is language variation in the form of non-standard dialects or language varieties (hereafter, varieties). Most NLP benchmarks are limited to standard language varieties. To fill this gap, we propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties, which aggregates an extensive set of task-varied variety datasets (10 text-level tasks covering 281 varieties). This allows for a comprehensive evaluation of NLP system performance on different language varieties. We provide substantial evidence of performance disparities between standard and non-standard language varieties, and we also identify language clusters with large performance divergence across tasks. We believe DIALECTBENCH provides a comprehensive view of the current state of NLP for language varieties and one step towards advancing it further. Code/data: https://github.com/ffaisal93/DialectBench
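The standard-vs-non-standard disparity analysis described above can be illustrated with a toy score table; the cluster names, variety names, and numbers below are invented, not taken from DIALECTBENCH:

```python
# Invented scores for illustration: per-cluster task accuracy by variety.
scores = {
    "arabic": {"msa": 0.82, "levantine": 0.61, "egyptian": 0.64},
    "german": {"standard": 0.88, "swiss": 0.70},
}
standard = {"arabic": "msa", "german": "standard"}  # designated standard variety

def variety_gaps(scores, standard):
    """For each cluster, compute the performance gap between the standard
    variety and every non-standard variety (positive = standard is better)."""
    gaps = {}
    for cluster, varieties in scores.items():
        base = varieties[standard[cluster]]
        gaps[cluster] = {v: round(base - s, 2)
                         for v, s in varieties.items()
                         if v != standard[cluster]}
    return gaps

print(variety_gaps(scores, standard))
```

Aggregating such gaps across tasks is one way to surface the clusters with the largest divergence that the abstract refers to.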